Learning Universal Adversarial Perturbations with Generative Models

Authors

  • Jamie Hayes
  • George Danezis
Abstract

Neural networks are known to be vulnerable to adversarial examples: inputs that have been intentionally perturbed to remain visually similar to the source input but cause a misclassification. It was recently shown that, given a dataset and classifier, there exist so-called universal adversarial perturbations, a single perturbation that causes a misclassification when applied to any input. In this work, we introduce universal adversarial networks, a generative network that is capable of fooling a target classifier when its generated output is added to a clean sample from a dataset. We show that this technique improves on known universal adversarial attacks.

I. INTRODUCTION

Machine learning models are increasingly relied upon for safety- and business-critical tasks, such as in medicine [22], [30], [41], robotics and automotive [28], [32], [40], security [2], [17], [38], and financial [13], [18], [36] applications. Recent research shows that machine learning models trained on entirely uncorrupted data are still vulnerable to adversarial examples [7], [12], [23], [24], [35], [37]: samples that have been maliciously altered so as to be misclassified by a target model while appearing unaltered to the human eye. Most work has focused on generating perturbations that cause a specific input to be misclassified; however, it has been shown that adversarial perturbations generalize across many inputs [35]. Moosavi-Dezfooli et al. [19] showed, in the most extreme case, that given a target model and a dataset, it is possible to construct a single perturbation that, when applied to any input, will cause a misclassification with high likelihood. These are referred to as universal adversarial perturbations (UAPs). In this work, we study the capacity of generative models to learn to craft UAPs on image datasets; we refer to these networks as universal adversarial networks (UANs). We show that a UAN is able to sample from noise and generate a perturbation such that, when applied to any input from the dataset, it will result in a misclassification by the target model. Furthermore, we show that perturbations produced by UANs improve on state-of-the-art methods for crafting UAPs (Section IV-A), have robust transferable properties (Section IV-D), and reduce the success of recently proposed defenses [1] (Section V).

II. BACKGROUND

We define adversarial examples and UAPs along with some terminology and notation. We then introduce the threat model considered and the datasets we use to evaluate the attack.

A. Adversarial Examples

Szegedy et al. [35] cast the construction of adversarial examples as an optimization problem. Given a target model, f, and a source input x, which is classified correctly by f as c, the attacker aims to find a perturbation, δ, such that x + δ is perceptually identical to x but f(x + δ) ≠ c. The attacker tries to minimize the distance between the source image and the adversarial image under an appropriate measure. The problem can be framed as finding a specific misclassification, referred to as a targeted attack, or any misclassification, referred to as a non-targeted attack. In the absence of a distance measure that accurately captures the perceptual differences between a source and an adversarial image, an ℓp metric is usually minimized [35]. Related work commonly uses the ℓ2 and ℓ∞ metrics [4], [19], [3], [20], [16], [14], [10], [6], [42].
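As a concrete illustration of this optimization view, the following is a minimal sketch of crafting a non-targeted, per-input adversarial example by iterative gradient ascent on the classifier's loss under an ℓ∞ budget (a PGD-style procedure, not the attack studied in this paper). The `model`, `x` (a batch of images in [0, 1]), and `label` arguments, as well as the budget and step schedule, are assumed placeholders.

```python
# Illustrative sketch only, not this paper's attack: a non-targeted per-input
# adversarial example found by iterative gradient ascent, with the perturbation
# kept inside an l-infinity ball of radius eps.
import torch
import torch.nn.functional as F

def linf_adversarial_example(model, x, label, eps=8 / 255, steps=10, step_size=2 / 255):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), label)   # non-targeted: raise the loss on the true class
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()        # ascend the loss
            delta.clamp_(-eps, eps)                       # enforce ||delta||_inf <= eps
            delta.copy_((x + delta).clamp(0, 1) - x)      # keep x + delta a valid image
        delta.grad.zero_()
    return (x + delta).detach()
```

A targeted variant would instead descend the loss of an attacker-chosen class rather than ascend the loss of the true class.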
The ℓ2 metric measures the Euclidean distance between two images, while the ℓ∞ metric measures the largest pixel-wise difference between two images (Chebyshev distance). We follow this practice here and construct attacks optimizing under both metrics. A UAP is an adversarial perturbation that is independent of the source image. Given a target model, f, and a dataset, X, a UAP is a perturbation, δ, such that ∀x ∈ X, x + δ is a valid input and Pr(f(x + δ) ≠ f(x)) = 1 − τ, where 0 < τ ≪ 1.

B. Threat Model

We consider an attacker whose goal is to craft UAPs against a target model, f. The adversarial image constructed by the attacker should be visually indistinguishable from the source image, evaluated through either the ℓ2 or ℓ∞ metric. Our attacks assume white-box access to f, as we backpropagate the error of the target model back to the UAN. In line with related work on UAPs [19], we consider a worst-case scenario with respect to data access, assuming that the attacker has knowledge of, and shares access to, any training data samples. We will not discuss the real-world limitations of that assumption here, but will follow that practice.

C. Datasets

We evaluate attacks using two popular datasets in adversarial examples research, CIFAR-10 [15] and ImageNet [29]. The CIFAR-10 dataset consists of 60,000 32×32 RGB images of objects in ten classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. It is split into 50,000 training images and 10,000 validation images.
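To make the setting above concrete, the sketch below shows one way a UAN-style attack could be wired up under these definitions: a small generator (a stand-in for the UAN architecture, which this excerpt does not specify) maps a noise vector to a single perturbation, the perturbation is scaled to an ℓ∞ budget and added to every image, and the frozen target classifier's error is backpropagated through f into the generator, reflecting the white-box threat model. The fooling-rate helper estimates Pr(f(x + δ) ≠ f(x)). The architecture, loss, and hyper-parameters are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of a UAN-style universal perturbation attack (assumed
# architecture and hyper-parameters). The target classifier f is frozen; only
# the generator is updated, using gradients that flow through f.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UAN(nn.Module):
    """Maps a noise vector z to a 3x32x32 perturbation (CIFAR-10 sized)."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 512), nn.ReLU(),
            nn.Linear(512, 3 * 32 * 32), nn.Tanh(),   # output in [-1, 1]
        )

    def forward(self, z):
        return self.net(z).view(-1, 3, 32, 32)

def train_uan(f, loader, eps=10 / 255, z_dim=100, epochs=5, device="cpu"):
    f.eval()                                          # target model stays frozen
    uan = UAN(z_dim).to(device)
    opt = torch.optim.Adam(uan.parameters(), lr=1e-4)
    z = torch.randn(1, z_dim, device=device)          # one noise sample -> one universal delta
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            delta = eps * uan(z)                      # ||delta||_inf <= eps via tanh scaling
            logits = f((x + delta).clamp(0, 1))
            loss = -F.cross_entropy(logits, y)        # simplified non-targeted objective
            opt.zero_grad()
            loss.backward()                           # error flows through frozen f into the UAN
            opt.step()
    return uan, z

def fooling_rate(f, uan, z, loader, eps=10 / 255, device="cpu"):
    """Estimate Pr(f(x + delta) != f(x)) over the dataset, i.e. 1 - tau."""
    delta = (eps * uan(z)).detach()
    fooled = total = 0
    with torch.no_grad():
        for x, _ in loader:
            x = x.to(device)
            clean = f(x).argmax(dim=1)
            adv = f((x + delta).clamp(0, 1)).argmax(dim=1)
            fooled += (adv != clean).sum().item()
            total += x.size(0)
    return fooled / total
```

The loss here is a simplified stand-in that simply pushes f away from the true labels; the key point it illustrates is that the perturbation is produced by a generator from noise and shared across all inputs, rather than optimized per image.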

Similar articles

Improvement of generative adversarial networks for automatic text-to-image generation

This research concerns the use of deep learning tools and image processing technology for the automatic generation of images from text. Previous research has used a single sentence to produce images. In this research, a memory-based hierarchical model is presented that uses three different descriptions, given in the form of sentences, to produce and refine the image. The proposed ...


Generative Adversarial Perturbations

In this paper, we propose novel generative models for creating adversarial examples, slightly perturbed images resembling natural images but maliciously crafted to fool pre-trained models. We present trainable deep neural networks for transforming images to adversarial perturbations. Our proposed models can produce image-agnostic and image-dependent perturbations for targeted and nontargeted at...


Deep Adversarial Robustness

Deep learning has recently contributed to learning state-of-the-art representations in service of various image recognition tasks. Deep learning uses cascades of many layers of nonlinear processing units for feature extraction and transformation. Recently, researchers have shown that deep learning architectures are particularly vulnerable to adversarial examples, inputs to machine learning mode...


Defense-GAN: Protecting Classifiers against Adversarial Attacks Using Generative Models

In recent years, deep neural network approaches have been widely adopted for machine learning tasks, including classification. However, they were shown to be vulnerable to adversarial perturbations: carefully crafted small perturbations can cause misclassification of legitimate images. We propose Defense-GAN, a new framework leveraging the expressive capability of generative models to defend de...


Generalizable Data-free Objective for Crafting Universal Adversarial Perturbations

Machine learning models are susceptible to adversarial perturbations: small changes to input that can cause large changes in output. It is also demonstrated that there exist input-agnostic perturbations, called universal adversarial perturbations, which can change the inference of target model on most of the data samples. However, existing methods to craft universal perturbations are (i) task s...



Publication date: 2017